1. Introduction

Consider the Gaussian design trace regression model

$$Y_i = \mathrm{tr}(X^i \theta) + \epsilon_i, \quad i = 1, \dots, n, \tag{1}$$

where $\epsilon \sim N(0, I_n)$ is an i.i.d. vector of Gaussian noise. Here the matrices $X^i$ are $d \times d$ square
matrices with i.i.d. entries $X^i_{mk} \sim N(0,1)$, and $\theta$ is the unknown $d \times d$ matrix we want to
make inference on. We are interested in the case where the model dimension $d^2$ is possibly large
compared to the sample size $n$, but where $\theta$ has low rank $k$, in which case we write $\theta \in R(k)$,
$1 \le k \le d$. This setting serves as a prototype for various matrix inference problems such as those occurring
in compressed sensing [4] or in quantum tomography [7]. We consider here a high-dimensional
regime where $\min(n, d) \to \infty$, reflecting contemporary statistical challenges.
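To make the sampling scheme in (1) concrete, the following minimal Python sketch draws data from the model for a small rank-one parameter. The helper names (`trace_product`, `simulate`, `rank_one`), the seed and the chosen dimensions are illustrative assumptions and do not come from the paper.

```python
import random


def trace_product(x, theta):
    """Return tr(x @ theta) for square matrices stored as lists of rows."""
    d = len(x)
    return sum(x[m][k] * theta[k][m] for m in range(d) for k in range(d))


def simulate(theta, n, rng):
    """Draw n pairs (X^i, Y_i) from model (1) with N(0,1) design entries and noise."""
    d = len(theta)
    designs, responses = [], []
    for _ in range(n):
        x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
        designs.append(x)
        responses.append(trace_product(x, theta) + rng.gauss(0.0, 1.0))
    return designs, responses


def rank_one(u, v):
    """Outer product u v^T, a matrix of rank at most one."""
    return [[ui * vj for vj in v] for ui in u]


rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
theta = rank_one([1.0, 0.0, -1.0], [0.5, 0.5, 0.0])
X, Y = simulate(theta, n=5, rng=rng)
```

Here `X` is a list of five 3 x 3 design matrices and `Y` the corresponding responses; with $d^2 = 9 > n = 5$ this toy example sits in the regime where the model dimension exceeds the sample size.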

The first problem we study in this paper is the signal detection problem with low-rank
alternatives: We want to test the hypothesis

$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta \ne 0, \ \theta \in R(k), \ \|\theta\| \ge \rho,$$

*This work was carried out when this author was a research associate in the University of Cambridge. 
where $\|\cdot\|$ equals either the Frobenius norm $\|\cdot\|_F$ or the nuclear norm $\|\cdot\|_*$ (defined in detail
below), and where $\rho$ should be the minimal 'signal strength' condition for the above hypothesis
testing problem to have a consistent solution (in the sense of Ingster, see [10]). We will show that
the minimax optimal detection boundary in Frobenius norm is of the form

$$\rho \simeq \min\big(\sqrt{d/n}, \ n^{-1/4}\big),$$

whereas in nuclear norm it is

$$\rho \simeq \min\big(\sqrt{kd/n}, \ \sqrt{k}\, n^{-1/4}\big).$$

A remarkable feature is that for the Frobenius norm the detection rate does not depend at all on
the complexity of the alternative hypothesis (the rank $k$), whereas for the nuclear norm it does.
The phase transition between the two regimes in these rates depends precisely on whether the
sample size $n$ exceeds the dimension $d^2$ of the maximal parameter space $R(d)$ or not. The upper
bounds in our proofs are related to the papers [9, 1] about the detection boundary in the sparse
regression setting, and our main contribution consists in deriving the matching lower bounds for
low rank alternatives.

Our interest in the detection boundary is triggered by the second problem we investigate here:
the question of existence and non-existence of adaptive confidence sets for low rank parameters.
It follows from general decision-theoretic principles (see Chapter 8.3 in [6] and also [8, 2]) that the
answer to this question is closely related to a 'composite version' of the detection problem (see
(15) below). This approach was employed in [14] to prove that adaptive and honest confidence
sets for the parameter $\theta$ do not exist in sparse regression models if an $\ell_2$-risk performance beyond
$O(n^{-1/4})$ is desired. In contrast, in the recent paper [5] it was shown that if sparsity constraints
are replaced by low rank conditions, then adaptive and fully honest confidence sets exist over the
entire parameter space $R(d)$. Adaptation means here that the expected Frobenius norm diameter
of the confidence set reflects the minimax risk over arbitrary low rank sub-models $R(k)$, $1 \le k \le d$.
The fact that the detection rates obtained here in Frobenius norm are independent of the rank
constraint $\theta \in R(k)$ provides another heuristic explanation of the result in [5].

Moreover, [5] constructed another confidence set whose diameter adapts to low rank sub-models
in the stronger nuclear norm distance, and that is honest for all $\theta$'s that are non-negative definite
and have trace equal to one, that is, whenever $\theta$ is the density matrix of a quantum state. Such
a constraint on $\theta$ is natural in the quantum physics context considered in [5], but not in general.
The question arises whether it is essentially necessary or not. In the present paper we show that
indeed the existence results of [5] are specific to the geometry induced by the Frobenius norm
or to the quantum state constraint, and that nuclear-norm adaptive and honest confidence sets
over general low rank parameter spaces do not exist in the model (1). For example, our results
imply that if one requires coverage of a confidence set over all of $R(d)$ then the worst case nuclear
norm diameter for rank-one parameters can be off the minimax estimation rate over $R(1)$ by as
much as $\sqrt{d}$. Our results thus further illustrate the subtleties involved in the theory of confidence
sets for high-dimensional parameters, and that the positive results in [5] are of a rather specific
nature.

Our proofs are given in the simplest model where both the design and the noise are Gaussian, 
and the matrices involved are of square type. As usual, our results extend without major difficulty 
to sub-Gaussian design and noise, to certain correlated random designs, and also to non-square 
matrices, at the expense of slightly more technical proofs. Generalisations of our results to the 
matrix completion problem are currently under investigation. 

2. Main results

2.1. Notation 

We write $M_d$ for the set of $d \times d$ matrices with real elements. If $\mathcal{X}: M_d \to \mathbb{R}^n$ denotes the
'sampling operator'

$$\theta \mapsto \mathcal{X}\theta = (\mathrm{tr}(X^1\theta), \dots, \mathrm{tr}(X^n\theta))^T,$$

then the model (1) can be written as

$$Y = \mathcal{X}\theta + \epsilon,$$

where $Y = (Y_1, \dots, Y_n)^T$ and $\epsilon = (\epsilon_1, \dots, \epsilon_n)^T$. We write $E^X$ for the expectation over the
distribution of $X$ only, and $E_\theta$ for the expectation conditional on $X$. The full expectation is
denoted by $\mathbb{E}_\theta = E^X E_\theta$. The corresponding probability laws are denoted by $P^X, P_\theta, \mathbb{P}_\theta$, and we
employ the usual $o/O/o_P/O_P$-notation with $\min(n, d) \to \infty$.

We denote the standard norm on Euclidean space by $\|\cdot\|_2$, and the associated inner product
by $\langle\cdot,\cdot\rangle_2$. Let $\|\cdot\|_F$ be the Frobenius norm over $M_d$, i.e.

$$\|M\|_F = \sqrt{\mathrm{tr}(M^T M)} = \sqrt{\sum_{j \le d} \lambda_j^2},$$

where $\lambda_j^2$ are the eigenvalues of $M^T M$. The associated inner product is

$$\langle U, V\rangle_F = \mathrm{tr}(U^T V).$$

We also define the nuclear norm of $M$ as

$$\|M\|_* = \sum_{j \le d} |\lambda_j|.$$

These two norms are in fact defined also for matrices that are not of square type. Finally we
recall that for any matrix $M \in R(k)$, we have

$$\|M\|_F \le \|M\|_* \le \sqrt{k}\,\|M\|_F.$$
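For completeness, here is a short derivation of these two inequalities. It is a sketch added for the reader and uses only that a matrix $M \in R(k)$ has at most $k$ nonzero values $|\lambda_j|$. Expanding the square gives the first inequality and the Cauchy-Schwarz inequality gives the second:

$$\|M\|_F^2 = \sum_{j \le k} \lambda_j^2 \le \Big(\sum_{j \le k} |\lambda_j|\Big)^2 = \|M\|_*^2, \qquad \|M\|_* = \sum_{j \le k} 1 \cdot |\lambda_j| \le \sqrt{k}\,\Big(\sum_{j \le k} \lambda_j^2\Big)^{1/2} = \sqrt{k}\,\|M\|_F.$$

The second bound is attained when all $k$ nonzero $|\lambda_j|$ are equal, which is the situation approximated by the prior used in the lower bound proofs.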

2.2. Signal detection for low rank alternatives 

We consider first the following hypothesis testing problem, also known as the signal detection
problem:

$$H_0: \theta = 0 \quad \text{vs.} \quad H_1: \theta \in R(k), \ \|\theta\| \ge \rho. \tag{2}$$

Here the alternative space is restricted to a 'low rank' hypothesis $\theta \in R(k)$ for some $1 \le k \le d$.
Moreover, for a separation constant $\rho > 0$, the detection boundary is described by a 'signal
strength' condition measured in terms of the size $\|\theta\| \ge \rho$ of the Frobenius norm or of the nuclear
norm of $\theta$. In the high-dimensional regime where $\min(n, d) \to \infty$, we want to find the minimal
sequence $\rho = \rho_{n,d}$ such that for any $\alpha > 0$ a level-$\alpha$ test $\Psi = \Psi(Y, X, \alpha)$ exists:

$$\mathbb{E}_0[\Psi] + \sup_{\theta \in H_1} \mathbb{E}_\theta[1 - \Psi] = \mathbb{P}_0(\text{reject } H_0) + \sup_{\theta \in H_1} \mathbb{P}_\theta(\text{accept } H_0) \le \alpha. \tag{3}$$

Recall that a test is simply a random indicator function $\Psi = 1_A$ where the rejection event $A$
depends only on $Y, X, \alpha$, and we require the sum of the type-one and the type-two error of the
test to be controlled at any fixed level $\alpha > 0$.

Theorem 1 Consider the testing problem (2) with norm $\|\cdot\|$. Define

$$r_{n,d} = \begin{cases} \min\big(\sqrt{d/n}, \ n^{-1/4}\big) & \text{if } \|\cdot\| = \|\cdot\|_F, \\ \min\big(\sqrt{kd/n}, \ \sqrt{k}/n^{1/4}\big) & \text{if } \|\cdot\| = \|\cdot\|_*. \end{cases}$$

1) Suppose $\rho \ge D r_{n,d}$. Then for every $\alpha > 0$ there exists a test $\Psi = \Psi(Y, X, \alpha)$ and finite
constants $D = D_\alpha > 0$, $n_\alpha \in \mathbb{N}$ such that (3) holds for every $n \ge n_\alpha$.

2) Conversely, suppose $\rho = o(r_{n,d})$ and $k = o(d)$ as $\min(n, d) \to \infty$. Then no test satisfying
(3) for every $\alpha > 0$ exists. In fact

$$\liminf_{n,d} \ \inf_{\Psi} \Big[\mathbb{E}_0[\Psi] + \sup_{\theta \in H_1} \mathbb{E}_\theta[1 - \Psi]\Big] \ge 1, \tag{4}$$

where the infimum extends over all test functions $\Psi = \Psi(Y, X)$.

The tests $\Psi$ constructed in the proof are given in (9) below and are straightforward to implement.
Note also that the $\|\cdot\|_*$-separated alternatives are a subset of the $\|\cdot\|_F$-separated alternatives
(see (10) below), and our results imply that an optimal test for the case $\|\cdot\| = \|\cdot\|_F$ is essentially
optimal also for $\|\cdot\|_*$.
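The separation rate of Theorem 1 and its phase transition at $n = d^2$ can be tabulated with a few lines of Python. This is a minimal sketch: the function name `detection_rate` and its argument names are assumptions made for illustration, and all constants are ignored.

```python
import math


def detection_rate(n, d, k=1, norm="frobenius"):
    """Return r_{n,d} from Theorem 1, up to constants.

    Frobenius norm: min(sqrt(d/n), n^(-1/4)); the rank k plays no role.
    Nuclear norm:   the Frobenius rate multiplied by sqrt(k).
    """
    base = min(math.sqrt(d / n), n ** -0.25)
    if norm == "frobenius":
        return base
    if norm == "nuclear":
        return math.sqrt(k) * base
    raise ValueError("norm must be 'frobenius' or 'nuclear'")


rate_many_samples = detection_rate(n=10_000, d=10)  # n > d^2: sqrt(d/n) is active
rate_few_samples = detection_rate(n=16, d=10)       # n < d^2: n^(-1/4) is active
```

At $n = d^2$ the two terms inside the minimum coincide, which is the phase transition described in the introduction.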


2.3. Confidence sets for low rank recovery 

Low rank recovery algorithms are well-studied in compressed sensing and high-dimensional statistics,
see e.g., [4, 7, 11, 12, 13, 3] and the references therein. In the setting of model (1) they provide
minimax optimal estimators $\hat\theta$ of $\theta \in R(k)$ with (high probability) performance guarantees

$$\|\hat\theta - \theta\|_F^2 \lesssim \frac{kd}{n}, \qquad \|\hat\theta - \theta\|_* \lesssim k\sqrt{\frac{d}{n}}. \tag{5}$$

The question we study here is whether associated uncertainty quantification methodology exists,
that is, whether we can find confidence sets $C_n \subset M_d$ such that

$$\inf_{\theta \in M_d} \mathbb{P}_\theta(\theta \in C_n) \ge 1 - \alpha, \tag{6}$$

at least for $\min(n, d)$ large enough, and such that the diameter $|C_n|$ of $C_n$ reflects the accuracy
of adaptive estimation in the sense that $|C_n|$ shrinks, with high probability, at the optimal rates
from (5) whenever $\theta \in R(k)$. We insist here on an adaptive confidence set that does not require
knowledge of the unknown rank $k$ of $\theta$.

A first result that is proved in the paper [5] is that such adaptive confidence sets do exist in 
the model (1) if the diameter is measured in Frobenius distance. The construction of this set is 
straightforward, see [5] for details. 

Theorem 2 (Theorem 2 in [5]) For every $\alpha > 0$ there exists a confidence set $C_n = C_n(Y, X, \alpha)$
such that for all $n \in \mathbb{N}$, (6) holds, and such that uniformly in $\theta \in R(k_0)$ for any $1 \le k_0 \le k$, with
high $\mathbb{P}_\theta$-probability the Frobenius-norm diameter $|C_n|_F$ of $C_n$ satisfies

$$|C_n|_F \lesssim \sqrt{\frac{k_0 d}{n}}.$$

A second result that is proved in the paper [5] is that an (asymptotic) adaptive confidence
set exists also in nuclear norm provided that the "quantum state constraint" is satisfied, namely,
provided it is known a priori that $\theta$ is non-negative definite and has nuclear norm one, and
provided the coverage requirement in (6) is relaxed to hold only over a maximal model $R(k)$ in
which asymptotically consistent estimation of $\theta$ is possible (i.e., $k\sqrt{d/n} = o(1)$). Define

$$R^+(k) = R(k) \cap \{\theta \text{ is non-negative definite}, \ \mathrm{tr}(\theta) = 1\},$$

the set of quantum state density matrices of rank at most $k$.

Theorem 3 (Theorem 4 in [5]) Assume $k\sqrt{d/n} = o(1)$ for some $1 \le k \le d$, and let $\alpha > 0$ be
given. Then there exists a confidence set $C_n = C_n(Y, X, \alpha)$ such that

$$\liminf_{\min(n,d) \to \infty} \ \inf_{\theta \in R^+(k)} \mathbb{P}_\theta(\theta \in C_n) \ge 1 - \alpha,$$

and such that uniformly in $\theta \in R^+(k_0)$ for any $1 \le k_0 \le k$, with high $\mathbb{P}_\theta$-probability the nuclear
norm diameter $|C_n|_*$ of $C_n$ satisfies

$$|C_n|_* \lesssim k_0\sqrt{\frac{d}{n}}.$$

In fact it is not difficult to generalise the above theorem to the case where the condition
$\mathrm{tr}(\theta) = 1$ is relaxed to $\|\theta\|_* \le 1$.


The next theorem, which is the main result of this subsection, implies that no analogue of
Theorem 2 can hold true if the Frobenius norm there is replaced by the nuclear norm, and it also
shows that Theorem 3 cannot hold true if $R^+(k)$ is replaced by $R(k)$, that is, if the 'quantum
state constraint' is relaxed. More precisely, we show that if a confidence set $C_n$ is required to have
coverage over the maximal model $R(k_1)$, then the worst case expected nuclear norm diameter of
$C_n$ over arbitrary sub-models $R(k_0)$, $k_0 = o(k_1)$, depends on the maximal model dimension $k_1$
and does not improve as $k_0 \downarrow 1$. The proof of Theorem 4 is based on Part 2) of Theorem 1 and
lower bound techniques for adaptive confidence sets from [8, 2].

Theorem 4 Let $k_1 \to \infty$ such that $k_1 = o(d)$ as $\min(n, d) \to \infty$. Suppose that for any $0 < \alpha < 1/3$
the confidence set $C_n = C_n(Y, X, \alpha)$ is asymptotically honest over the maximal model $R(k_1)$,
that is, it satisfies

$$\liminf_{\min(n,d) \to \infty} \ \inf_{\theta \in R(k_1)} \mathbb{P}_\theta(\theta \in C_n) \ge 1 - \alpha. \tag{7}$$

Then for every $k_0 = o(k_1)$ and some constant $c > 0$ depending on $\alpha$, we have

$$\sup_{\theta \in R(k_0)} \mathbb{E}_\theta |C_n|_* \ge c\sqrt{\frac{k_1 d}{n}} \tag{8}$$

for every $\min(n, d)$ large enough. In particular no confidence set exists that is honest over all of
$M_d$ and that adapts in nuclear norm to any model $R(k_0)$, $k_0 = o(\sqrt{d})$.

For notational simplicity we have lower bounded the expected diameter $|C_n|_*$ in (8), but the
proof actually contains a stronger 'in probability version' of this lower bound.

Remark 1 A few remarks on Theorem 4 are in order:

i) In the least favourable case where one wants coverage over the entire $R(d) = M_d$ while still
adapting to rank-one matrices (i.e., $k_0 = 1$), the performance of any honest confidence set is off
the minimax optimal adaptive estimation rate $\sqrt{d/n}$ over $R(1)$ by a diverging factor that can be
as close to $\sqrt{d}$ as desired.

ii) Even if one restricts coverage to hold only for 'consistently estimable models' $R(k_1)$ with
$k_1\sqrt{d/n} \to 0$ (as in Theorem 3), the diameter $|C_n|_*$ can be off the minimax rate of estimation
over $R(1)$ by a factor of $\sqrt{k_1}$.

iii) We also note that the above result does not disprove the existence of adaptive confidence
sets for sub-models $R(k_0)$ of 'moderate rank' where $k_0 \ge \sqrt{d}$. While this regime is more of technical
interest (note that it rules out $n \le d^2$ for consistent recovery to be possible), it currently
remains open (it is related to the apparently hard problem of finding optimal separation rates in
the composite testing problem (15) below).
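The size of the gap described in i) and ii) can be made explicit by dividing the lower bound in (8) by the adaptive nuclear norm rate from (5). The sketch below does this with all constants dropped; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def nuclear_rate(k0, d, n):
    """Order of the nuclear-norm estimation rate k0 * sqrt(d/n) from (5)."""
    return k0 * math.sqrt(d / n)


def diameter_lower_bound(k1, d, n):
    """Order of the lower bound sqrt(k1 * d / n) from (8), constant c dropped."""
    return math.sqrt(k1 * d / n)


def gap_factor(k0, k1, d, n):
    """Ratio of the two orders; it equals sqrt(k1) / k0 whatever d and n are."""
    return diameter_lower_bound(k1, d, n) / nuclear_rate(k0, d, n)


# Coverage over R(100) while adapting to rank one, with d = 400 and n = 10^6.
example_gap = gap_factor(k0=1, k1=100, d=400, n=10**6)
```

For $k_0 = 1$ the ratio is $\sqrt{k_1}$, matching the factor in ii), and letting $k_1$ approach $d$ gives the factor close to $\sqrt{d}$ in i).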

3. Proofs 

3.1. Proof of Theorem 1, upper bounds 

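When $n < d^2$ then define

$$T_n = \frac{1}{n}\|Y\|_2^2 - 1, \qquad \tau_n = n^{-1/2},$$

but when $n \ge d^2$ set

$$T_n = \frac{2}{n(n-1)} \sum_{i<j} \ \sum_{1 \le m \le d, \, 1 \le k \le d} Y_i X^i_{mk} Y_j X^j_{mk}, \qquad \tau_n = \frac{d}{n}.$$

The test statistic is

$$\Psi_n = 1\{T_n \ge z_\alpha \tau_n\} \tag{9}$$

where $z_\alpha$ are quantile constants chosen below.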
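As an illustration of how simple the test is in the regime $n < d^2$, the sketch below computes the statistic $\|Y\|_2^2/n - 1$, the threshold scale $n^{-1/2}$ and the resulting decision. The function name `chi_square_test` is an assumption, and the quantile constant `z_alpha` is passed in by the caller because its value is only fixed in the proof; the responses used in the example are made up.

```python
import math


def chi_square_test(y, z_alpha):
    """Return (T_n, tau_n, reject) for the n < d^2 version of the test (9).

    T_n   = ||y||_2^2 / n - 1, which centres the mean of the squared responses,
    tau_n = n^(-1/2),
    and H_0 is rejected when T_n >= z_alpha * tau_n.
    In this regime the statistic uses the responses only, not the design.
    """
    n = len(y)
    t_n = sum(v * v for v in y) / n - 1.0
    tau_n = 1.0 / math.sqrt(n)
    return t_n, tau_n, t_n >= z_alpha * tau_n


# Hypothetical responses: unit energy (null-like) versus inflated energy.
null_like = chi_square_test([1.0, -1.0, 1.0, -1.0], z_alpha=2.0)
signal_like = chi_square_test([2.0, 2.0, 2.0, 2.0], z_alpha=2.0)
```

Under the alternative the mean of $Y_i^2$ is inflated by $\|\theta\|_F^2$, which is why a one-sided threshold on the centred energy suffices.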

These tests work for Frobenius norm separation, by effectively the same proofs as in [9], using
that we can embed the matrix regression model into a vector regression model with $p = d^2$
parameters, and since the separation rates only depend on the model dimension (and not on low
rank or sparsity degrees). However, to provide intuition, we give some details, first for the case
$n < d^2$: Under $H_0$ we have $Y = \epsilon$ and so

$$\mathbb{E}_0 \Psi_n = \Pr\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^n (\epsilon_i^2 - E\epsilon_i^2) \ge z_\alpha\Big) \le \alpha/2$$

for every $n \in \mathbb{N}$ and $z_\alpha$ large enough (using either Chebyshev's inequality and $E\epsilon_i^4 = 3$, or
Theorem 4.1.9 in [6] for a more precise non-asymptotic bound). Now for the alternatives $\theta \in H_1$
we use the basic concentration result Lemma 1a) in [5] which implies that for any fixed $\theta$ the
event

$$\mathcal{E} = \Big\{\Big|\frac{1}{n}\|\mathcal{X}\theta\|_2^2 - \|\theta\|_F^2\Big| \le \|\theta\|_F^2/2\Big\}$$
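has $P^X$-probability at least $1 - 2\exp(-n/24)$, and so, for $n \ge n_\alpha$ such that $2\exp(-n/24) \le \alpha/6$,

$$
\begin{aligned}
\mathbb{E}_\theta(1 - \Psi_n) &= \mathbb{P}_\theta(T_n < z_\alpha \tau_n) \\
&= \Pr\Big(\frac{1}{n}\|\mathcal{X}\theta + \epsilon\|_2^2 - 1 < \frac{z_\alpha}{\sqrt{n}}\Big) \\
&= \Pr\Big(\frac{1}{n}\|\mathcal{X}\theta\|_2^2 - \frac{z_\alpha}{\sqrt{n}} < -\frac{2}{n}\epsilon^T\mathcal{X}\theta - \Big(\frac{1}{n}\|\epsilon\|_2^2 - 1\Big)\Big) \\
&\le \Pr\Big(\frac{\|\theta\|_F^2}{2} - \frac{z_\alpha}{\sqrt{n}} < -\frac{2}{n}\epsilon^T\mathcal{X}\theta - \frac{1}{n}\sum_{i=1}^n(\epsilon_i^2 - 1), \ \mathcal{E}\Big) + 2\exp(-n/24) \\
&\le \Pr\Big(\Big|\frac{2}{n}\epsilon^T\mathcal{X}\theta\Big| > \|\theta\|_F^2/8, \ \mathcal{E}\Big) + \Pr\Big(-\frac{1}{\sqrt{n}}\sum_{i=1}^n(\epsilon_i^2 - E\epsilon_i^2) > z_{\alpha/3}\Big) + \alpha/6
\end{aligned}
$$

since, by the hypothesis on $\rho$, we have for $D$ large enough that

$$\frac{\|\theta\|_F^2}{2} - \frac{z_\alpha}{\sqrt{n}} \ge \frac{\|\theta\|_F^2}{4} \ge \frac{2 z_{\alpha/3}}{n^{1/2}}.$$

The last probability is bounded by $\alpha/6$ as under $H_0$, and the last but one probability is also
bounded by $\alpha/6$ by a direct (conditional on $X$) Gaussian tail inequality (restricting to the event
$\mathcal{E}$, just as in term II of the corresponding proof in [5] with $\theta = 0$ there), so that in total we have
bounded the testing errors in (3) by $\alpha/2 + (3/6)\alpha = \alpha$, as desired. The case $n \ge d^2$ follows from
similar but slightly more technical arguments, adapting the arguments from the proof of Theorem 3
in [5], or arguing directly as in Theorem 4.3 in [9] with $p = d^2$.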


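The test (9) also works for nuclear-norm separation since

$$H_1 = \{\theta \in R(k): \|\theta\|_* \ge c\sqrt{k}\rho\}$$

is a subset of

$$H_1' = \{\theta \in R(k): \|\theta\|_F \ge c\rho\}$$

in view of the inequality

$$\|\theta\|_F \ge (1/\sqrt{k})\|\theta\|_* \quad \forall \theta \in R(k),$$

so that

$$\mathbb{E}_0\Psi_n + \sup_{\theta \in H_1} \mathbb{E}_\theta(1 - \Psi_n) \le \mathbb{E}_0\Psi_n + \sup_{\theta \in H_1'} \mathbb{E}_\theta(1 - \Psi_n) \le \alpha. \tag{10}$$

We now turn to the more difficult lower bounds.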


3.2. Proof of Theorem 1, lower bounds 


Let $\Psi$ be any test, that is, any measurable function of $Y, X$ that takes values in $\{0, 1\}$. Assume
$\rho = o(r_{n,d})$ as $\min(n, d) \to \infty$ and let $H_1 = H_1(\rho)$ be the corresponding alternative hypothesis.

Step I: Reduction to averaged likelihood ratios: Let $\pi = \pi_{n,d}$ be a sequence of finitely supported
probability distributions on $M_d$ such that $\pi_{n,d}(H_1) \to 1$, and denote by $\pi|H_1$ that measure
restricted to $H_1$ and re-normalised to unit mass. Define

$$Z = E_{\theta \sim \pi} \prod_{i \le n} \frac{dP_i^{(\theta)}}{dP_i^{(0)}} = \int \prod_{i \le n} \frac{dP_i^{(\theta)}}{dP_i^{(0)}} \, d\pi(\theta),$$
where $dP_i^{(\theta)}$ is the distribution of $Y_i | X$ when the parameter generating the data is $\theta$, and $dP_i^{(0)}$
is the distribution of $Y_i | X$ when the parameter generating the data is $0$. Then, by a standard
testing lower bound (e.g., (6.23) in [6]), for any $\eta > 0$,

$$
\begin{aligned}
\mathbb{E}_0\Psi + \sup_{\theta \in H_1} \mathbb{E}_\theta(1 - \Psi) &\ge \mathbb{E}_0\Psi + E_{\theta \sim \pi|H_1} \mathbb{E}_\theta(1 - \Psi) \\
&\ge \mathbb{E}_0\Psi + E_{\theta \sim \pi} \mathbb{E}_\theta(1 - \Psi) - o(1) \\
&= E^X\big[E_0\Psi + E_{\theta \sim \pi} E_\theta(1 - \Psi)\big] - o(1) \\
&\ge (1 - \eta)\Big(1 - \frac{\sqrt{\mathbb{E}_0(Z - 1)^2}}{\eta}\Big) - o(1).
\end{aligned}
$$

Now since

$$\mathbb{E}_0[Z - 1]^2 = \mathbb{E}_0[Z^2] - 1,$$

if we show that $\mathbb{E}_0[Z^2] \le 1 + o(1)$ as $\min(n, d) \to \infty$ for a suitable choice of $\pi$, then the lower
bound (4) will follow by letting $\eta \to 0$. Recall the notation $\mathbb{E}_\theta = E^X E_\theta$.


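Step II: Computation of $E_0[Z^2]$: The $(Y_i)$ are independent with distribution $N((\mathcal{X}\theta)_i, 1)$
conditional on the design $X$, hence

$$Z = E_{\theta \sim \pi} \prod_{i \le n} \frac{\exp\big(-\frac{1}{2}(y_i - (\mathcal{X}\theta)_i)^2\big)}{\exp\big(-\frac{1}{2}y_i^2\big)} = E_{\theta \sim \pi} \prod_{i \le n} \exp\big(y_i(\mathcal{X}\theta)_i\big)\exp\big(-\tfrac{1}{2}((\mathcal{X}\theta)_i)^2\big)$$

and can hence write

$$
\begin{aligned}
E_0[Z^2] &= \int_{\mathbb{R}^n} \Big(E_{\theta \sim \pi} \prod_{i \le n} \exp\big(y_i(\mathcal{X}\theta)_i\big)\exp\big(-\tfrac{1}{2}((\mathcal{X}\theta)_i)^2\big)\Big)^2 \prod_{i \le n} \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{y_i^2}{2}\Big) \, dy_1 \dots dy_n \\
&= \int_{\mathbb{R}^n} \Big(E_{\theta \sim \pi}\Big[\exp\big(-\tfrac{1}{2}\|\mathcal{X}\theta\|_2^2\big) \prod_{i \le n} \exp\big(y_i(\mathcal{X}\theta)_i\big)\Big]\Big)^2 \prod_{i \le n} \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{y_i^2}{2}\Big) \, dy_1 \dots dy_n.
\end{aligned}
$$

Thus, if $\theta, \theta'$ are independent copies with joint law $\pi^2$, then we have

$$
\begin{aligned}
E_0[Z^2] &= \int_{\mathbb{R}^n} E_{\pi^2}\Big[\exp\big(-\tfrac{1}{2}\|\mathcal{X}\theta\|_2^2 - \tfrac{1}{2}\|\mathcal{X}\theta'\|_2^2\big) \prod_{i \le n} \frac{1}{\sqrt{2\pi}}\exp\Big(y_i(\mathcal{X}(\theta + \theta'))_i - \frac{y_i^2}{2}\Big)\Big] \, dy_1 \dots dy_n \\
&= E_{\pi^2}\Big[\exp\big(-\tfrac{1}{2}\|\mathcal{X}\theta\|_2^2 - \tfrac{1}{2}\|\mathcal{X}\theta'\|_2^2\big) \prod_{i \le n} \int \frac{1}{\sqrt{2\pi}}\exp\Big(-\tfrac{1}{2}\big(y_i - (\mathcal{X}(\theta + \theta'))_i\big)^2\Big) dy_i \, \exp\Big(\tfrac{1}{2}(\mathcal{X}(\theta + \theta'))_i^2\Big)\Big] \\
&= E_{\pi^2}\Big[\exp\Big(\tfrac{1}{2}\|\mathcal{X}(\theta + \theta')\|_2^2 - \tfrac{1}{2}\|\mathcal{X}\theta\|_2^2 - \tfrac{1}{2}\|\mathcal{X}\theta'\|_2^2\Big)\Big].
\end{aligned}
$$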
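The inner $y_i$-integrals in Step II are evaluated by completing the square. As a worked check (a sketch in the notation of this proof, where $a$ stands for the $i$-th coordinate of $\mathcal{X}(\theta + \theta')$), for any real $a$:

$$\int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\exp\Big(ya - \frac{y^2}{2}\Big) dy = \exp\Big(\frac{a^2}{2}\Big)\int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{(y - a)^2}{2}\Big) dy = \exp\Big(\frac{a^2}{2}\Big).$$

Combined with the polarisation identity $\frac{1}{2}\|u + v\|_2^2 - \frac{1}{2}\|u\|_2^2 - \frac{1}{2}\|v\|_2^2 = \langle u, v\rangle_2$, this shows that the final expression of Step II can equivalently be read as the $\pi^2$-expectation of $\exp(\langle \mathcal{X}\theta, \mathcal{X}\theta'\rangle_2)$.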


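Step III: Integrating over $X$: The $E^X$-expectation of the last expression can be bounded by

$$E_{\pi^2}\Big[\exp\Big(\frac{n}{2}\big(\|\theta + \theta'\|_F^2 - \|\theta\|_F^2 - \|\theta'\|_F^2\big)\Big) \, E^X\exp\Big(\frac{1}{2}(Z_1 - Z_2 - Z_3)\Big)\Big]$$

where

$$Z_\ell = \|\mathcal{X}\vartheta_\ell\|_2^2 - n\|\vartheta_\ell\|_F^2, \quad \text{with } \vartheta_1 = \theta + \theta', \ \vartheta_2 = \theta, \ \vartheta_3 = \theta'.$$

The last factor can be bounded, by applying the Cauchy-Schwarz inequality twice, by

$$\big(E^X\exp(Z_1)\big)^{1/2}\big(E^X\exp(2Z_2)\big)^{1/4}\big(E^X\exp(2Z_3)\big)^{1/4}. \tag{11}$$

Since $\mathcal{X}\vartheta_\ell \sim N(0, \|\vartheta_\ell\|_F^2 I_n)$ the distribution of $Z_\ell$ is the one of $\|\vartheta_\ell\|_F^2\sum_{i=1}^n (g_i^2 - 1)$ where the $g_i$
are i.i.d. $N(0, 1)$. Applying Theorem 3.1.9 in [6] with $\tau_i = 1$ and $\lambda = \|\vartheta_1\|_F^2$ or $\lambda = 2\|\vartheta_\ell\|_F^2$, $\ell = 2, 3$
(and hence setting $\|A\| = 1$, $\|A\|_{HS}^2 = n$ in that theorem) we see that if $\max_\ell \|\vartheta_\ell\|_F^2 < 1/4$
then

$$E^X\exp(Z_1) \le \exp\Big(\frac{n\|\vartheta_1\|_F^4}{1 - 2\|\vartheta_1\|_F^2}\Big)$$

and

$$E^X\exp(2Z_\ell) \le \exp\Big(\frac{4n\|\vartheta_\ell\|_F^4}{1 - 4\|\vartheta_\ell\|_F^2}\Big), \quad \ell = 2, 3.$$

As a consequence if

$$\max_{\ell = 1, 2, 3} \|\vartheta_\ell\|_F = o(n^{-1/4}) \tag{12}$$

then the product (11) is bounded above by $1 + o(1)$. We conclude that if the prior $\pi$ satisfies
(12) almost surely then

$$
\begin{aligned}
\mathbb{E}_0[Z^2] = E^X E_0[Z^2] &\le (1 + o(1)) \times E_{\pi^2}\exp\Big(\frac{n}{2}\big(\|\theta + \theta'\|_F^2 - \|\theta\|_F^2 - \|\theta'\|_F^2\big)\Big) \\
&= (1 + o(1)) \times E_{\pi^2}\exp\big(n\langle\theta, \theta'\rangle_F\big).
\end{aligned}
$$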


Step IV: Construction of $\pi$ and bounds for $\mathbb{E}_0[Z^2]$: Assume for notational simplicity that
$d$ is an integer multiple of $k$; the general case needs only minor notational adjustment. Pick
independent random $d \times 1$ vectors $v_\ell$, $\ell = 1, \dots, k$, each of which consists of i.i.d. Rademacher
entries (i.e., taking values $\pm 1$ with probability $1/2$). Create a matrix $W$ as follows: In the first
$d/k$ columns insert $v_1$ times a random sign $B_{1,j}$, $j = 1, \dots, d/k$. Then, in the $\ell$-th block repeat
the same with $v_1$ replaced by $v_\ell$, and random signs $B_{\ell,j}$, $j = 1, \dots, d/k$. If $\|\cdot\| = \|\cdot\|_F$ let
$\gamma_n = \rho_n/d$ and if $\|\cdot\| = \|\cdot\|_*$ set $\gamma_n = 2\rho_n/(\sqrt{k}d)$, so that in either case

$$\gamma_n = o\Big(\min\big(\sqrt{1/(dn)}, \ d^{-1}n^{-1/4}\big)\Big).$$

Define the random matrix $\theta = \gamma_n W$ and let $\theta'$ be an independent copy of it. Thus

$$n\langle\theta, \theta'\rangle_F = n\gamma_n^2 \sum_{\ell=1}^{k}\sum_{m=1}^{d}\sum_{j=1}^{d/k} v_{\ell,m}B_{\ell,j}v'_{\ell,m}B'_{\ell,j} = n\gamma_n^2 \sum_{\ell}\sum_{m} v_{\ell,m}v'_{\ell,m}\sum_{j} B_{\ell,j}B'_{\ell,j}.$$
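The block construction of $W$ and the factorisation of the inner product in the last display can be checked numerically. The following sketch builds two independent copies of $W$ from Rademacher vectors and block signs and compares both sides of the identity (without the common factor $n\gamma_n^2$). The helper names and the small values of $d$ and $k$ are assumptions made for illustration.

```python
import random


def draw_prior_parts(d, k, rng):
    """Draw Rademacher vectors v_l (length d) and block signs B_{l,j} (d/k per block)."""
    if d % k != 0:
        raise ValueError("this sketch assumes that k divides d")
    v = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]
    b = [[rng.choice((-1, 1)) for _ in range(d // k)] for _ in range(k)]
    return v, b


def build_w(v, b):
    """Assemble W: column j of block l equals the sign B_{l,j} times v_l."""
    k, d, block = len(v), len(v[0]), len(b[0])
    w = [[0] * d for _ in range(d)]
    for l in range(k):
        for j in range(block):
            for m in range(d):
                w[m][l * block + j] = b[l][j] * v[l][m]
    return w


def frobenius_inner(a, c):
    """Frobenius inner product tr(a^T c) of two matrices of equal shape."""
    return sum(x * y for row_a, row_c in zip(a, c) for x, y in zip(row_a, row_c))


rng = random.Random(1)  # fixed seed, only for reproducibility of the sketch
d, k = 6, 2
v, b = draw_prior_parts(d, k, rng)
v2, b2 = draw_prior_parts(d, k, rng)
w, w2 = build_w(v, b), build_w(v2, b2)

lhs = frobenius_inner(w, w2)
rhs = sum(
    sum(x * y for x, y in zip(v[l], v2[l])) * sum(x * y for x, y in zip(b[l], b2[l]))
    for l in range(k)
)
```

Every entry of $W$ is $\pm 1$, so $\|W\|_F^2 = d^2$ regardless of the draw, and each column is a signed copy of one of the $k$ vectors $v_\ell$, so the rank is at most $k$.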

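As products of Rademacher variables are again Rademacher variables we have, for $\varepsilon_{\ell,m}$, $\tilde\varepsilon_{\ell,j}$
i.i.d. Rademacher variables (all defined on a suitable product probability space),

$$
\begin{aligned}
E_{\pi^2}\exp\big(n\langle\theta, \theta'\rangle_F\big) &= E_{\varepsilon}E_{\tilde\varepsilon}\exp\Big(n\gamma_n^2 \sum_{\ell}\sum_{m} \varepsilon_{\ell,m}\sum_{j} \tilde\varepsilon_{\ell,j}\Big) \\
&= \prod_{\ell=1}^{k} E_{\varepsilon}E_{\tilde\varepsilon}\exp\Big(n\gamma_n^2 \sum_{m} \varepsilon_{\ell,m}\sum_{j} \tilde\varepsilon_{\ell,j}\Big).
\end{aligned}
\tag{13}
$$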

Conditional on the values of $\varepsilon$ we set $\lambda = n\gamma_n^2\sum_{m=1}^{d} \varepsilon_{\ell,m}$ and note that

$$|\lambda| \le nd\gamma_n^2 = o(1).$$

By Taylor expansion or standard properties of the hyperbolic cosine (as, e.g., in the proof of
Theorem 6.2.9 in [6])

$$E_{\tilde\varepsilon}\exp\Big(\lambda\sum_{j=1}^{d/k} \tilde\varepsilon_{\ell,j}\Big) = \cosh(\lambda)^{d/k} \le \exp\big(\lambda^2 d/k\big)$$

and thus, since $[EU]^k \le E[U^k]$ for any non-negative random variable $U$, the right hand side in
(13) is bounded above by

$$\big(E_{\varepsilon}\exp(\lambda^2 d/k)\big)^k \le E_{\varepsilon}\exp(\lambda^2 d) = E_{\varepsilon}\exp\Big(n^2\gamma_n^4 d\Big(\sum_{m=1}^{d} \varepsilon_m\Big)^2\Big) = E\exp\big(Z^2/c^2\big)$$

where the Rademacher sum $Z = \sum_{m=1}^{d} \varepsilon_m$ is a sub-Gaussian random variable with variance
proxy $\sigma^2 = d$ (cf. Section 2.3 in [6]). Thus by (2.24) in [6] we have

$$E\exp\big(Z^2/c^2\big) \le 1 + \frac{2}{c^2/2\sigma^2 - 1} = 1 + o(1)$$

since

$$\frac{c^2}{\sigma^2} = \frac{1}{d^2n^2\gamma_n^4} \to \infty$$

as $n, d \to \infty$. Summarising all steps so far we conclude

$$0 \le E^X E_0[Z - 1]^2 = \mathbb{E}_0[Z^2] - 1 \le 1 - 1 + o(1) = o(1),$$

noting that (12) holds $\pi$-almost surely in view of

$$\|\theta\|_F^2 = \gamma_n^2\|W\|_F^2 = \gamma_n^2 d^2 = o(n^{-1/2}).$$


Step V: Asymptotic concentration of $\pi$ on $H_1$: Finally we show that for the above prior we
have indeed $\pi(H_1) \to 1$. First, since $\theta$ consists of columns that are linear combinations of at
most $k$ distinct vectors $v_\ell$, we immediately have $\theta \in R(k)$ almost surely. Moreover, for the case
$\|\cdot\| = \|\cdot\|_F$ we have from the last display and by definition of $\gamma_n$ that $\|\theta\|_F^2 = \rho_n^2$, so $\pi(H_1) = 1$
follows.

For the case $\|\cdot\| = \|\cdot\|_*$ we have to show that

$$\pi_{n,d}\big(\|\theta\|_* \ge \rho_n\big) \to 1$$

as $\min(n, d) \to \infty$. We can transform $\theta$ into the $d \times k$ matrix $\theta U$ consisting of $k$ column vectors
$\gamma_n\sqrt{d/k}\,v_\ell$, $\ell = 1, \dots, k$. The corresponding $d \times k$ matrix $U$ consists of $k$ column vectors, the $\ell$-th
of which has zero entries except for the indices $m \in [\ell d/k, \dots, -1 + (\ell + 1)d/k]$, where it equals
$\sqrt{k/d}\,B_{\ell,m}$. Thus, $U$ is an orthonormal projection matrix and we deduce that

$$\|\theta\|_* \ge \|\theta U\|_*.$$

We can renormalise the column vectors of $\theta U$ so that

$$\theta U = \gamma_n\frac{d}{\sqrt{k}}\Big(\cdots \frac{v_\ell}{\sqrt{d}} \cdots\Big) = \gamma_n\frac{d}{\sqrt{k}}V.$$

The $d \times k$ matrix $V$ consists of scaled i.i.d. Rademacher entries, and hence the proof of Lemma 1
in [14] (with $n = d$, $k = k_1 = p$ in the first display on p.2868 there) implies that, if $k/d \to 0$, then
with probability as close to one as desired, the smallest singular value of $V$ is bounded below by
$1/2$ for $d$ large enough. As a consequence $\|V\|_* \ge k/2$ and so, with probability approaching one,

$$\|\theta\|_* \ge \gamma_n d\sqrt{k}/2 = \rho_n.$$


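Note that the same lower bound holds for

$$\|\theta - R(k_0)\|_* \equiv \inf_{\theta' \in R(k_0)} \|\theta - \theta'\|_* \ge \gamma_n\frac{d}{\sqrt{k}}\sum_{j=k_0+1}^{k} |\lambda_j| \ge \gamma_n\frac{d}{\sqrt{k}}\,(k - k_0)/2 \tag{14}$$

for any $k_0 < k$, if the absolute eigenvalues in the last display are assumed to be in decreasing
order.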


3.3. Proof of Theorem 4 


Consider the composite testing problem

$$H_0: \theta \in R(k_0) \quad \text{vs.} \quad H_1: \theta \in R(k_1), \ \|\theta - R(k_0)\|_* = \inf_{\theta' \in R(k_0)} \|\theta - \theta'\|_* \ge \rho. \tag{15}$$

From (14) with $k = k_1$ and $k_0 = o(k_1)$ we see that for $\min(n, d)$ large enough such that
$(k_1 - k_0)/2 \ge k_1/4$, the prior $\pi$ from the previous proof with $\gamma_n = 4\rho_n/(\sqrt{k}d)$ asymptotically
concentrates on $H_1$. As a consequence testing (15) is no easier than when $H_0 = \{0\}$, so
that when $\rho = o(\sqrt{k_1 d/n})$ then the proof of Part 2 of Theorem 1 implies

$$\liminf_{n,d} \ \inf_{\Psi} \Big[\sup_{\theta \in H_0} \mathbb{E}_\theta\Psi + \sup_{\theta \in H_1} \mathbb{E}_\theta(1 - \Psi)\Big] \ge 1. \tag{16}$$

Now assume by way of contradiction that there exists $C_n$ that satisfies (7) with $\alpha < 1/3$ and
such that for every $c > 0$ there exist infinitely many $n, d$ such that

$$\sup_{\theta \in H_0} \mathbb{E}_\theta |C_n|_* < c\sqrt{k_1 d/n}.$$

Passing to the infinite subsequence $\min(n, d) \to \infty$ along which the last inequalities hold, we
deduce from Markov's inequality that

$$\sup_{\theta \in R(k_0)} \mathbb{P}_\theta\big(|C_n|_* > \alpha\sqrt{k_1 d/n}\big) \le c/\alpha < \alpha$$

for $c$ small enough depending only on $\alpha$. Then, by Proposition 8.6.3 in [6] we can construct a
test for (15) for which the testing errors in (16) are no more than $3\alpha < 1$ along the chosen
subsequence, a contradiction that completes the proof.